AITopics | auroc score

auroc score

b153f11554e8f7926189e3f21185f00f-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-11-2026, 09:56:27 GMT

dataset, neural network, variable importance, (14 more...)

Neural Information Processing Systems

Country:

Asia > Bangladesh (0.04)
Asia > Russia > Siberian Federal District > Krasnoyarsk Krai > Krasnoyarsk (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(3 more...)

Genre: Research Report > New Finding (0.45)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
(5 more...)

LexiMark: Robust Watermarking via Lexical Substitutions to Enhance Membership Verification of an LLM's Textual Training Data

German, Eyal, Antebi, Sagiv, Habler, Edan, Shabtai, Asaf, Elovici, Yuval

arXiv.org Artificial Intelligence, Oct-7-2025

Large language models (LLMs) can be trained or fine-tuned on data obtained without the owner's consent. Verifying whether a specific LLM was trained on particular data instances or an entire dataset is extremely challenging. Dataset watermarking addresses this by embedding identifiable modifications in training data to detect unauthorized use. However, existing methods often lack stealth, making them relatively easy to detect and remove. In light of these limitations, we propose LexiMark, a novel watermarking technique designed for text and documents, which embeds synonym substitutions for carefully selected high-entropy words. Our method aims to enhance an LLM's memorization capabilities on the watermarked text without altering the semantic integrity of the text. As a result, the watermark is difficult to detect, blending seamlessly into the text with no visible markers, and is resistant to removal due to its subtle, contextually appropriate substitutions that evade automated and manual detection. We evaluated our method using baseline datasets from recent studies and seven open-source models: LLaMA-1 7B, LLaMA-3 8B, Mistral 7B, Pythia 6.9B, as well as three smaller variants from the Pythia family (160M, 410M, and 1B). Our evaluation spans multiple training settings, including continued pretraining and fine-tuning scenarios. The results demonstrate significant improvements in AUROC scores compared to existing methods, underscoring our method's effectiveness in reliably verifying whether unauthorized watermarked data was used in LLM training.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2506.14474

Country:

Asia (1.00)
Europe (0.67)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Bond-Centered Molecular Fingerprint Derivatives: A BBBP Dataset Study

Godin, Guillaume

arXiv.org Artificial Intelligence, Oct-7-2025

A strong and fast baseline in molecular property prediction is a Random Forest (RF) trained on ECFP4/ECFP6 descriptors. In practice, the count-based variant of ECFP generally outperforms the binary variant, especially for classification. Recent deep-learning approaches can match or exceed these baselines, including pretrained transformer-CNN models (5) and graph neural networks such as Chemprop or AttentiveFP (6). Chemprop's key architectural choice is directed, bond-centered message passing, in contrast to the more common atom-centered formulations used by many MPNNs. Because much of the remaining architecture is comparable across message-passing GNNs, this raises a focused question: what concrete advantage does the bond-centered formulation confer over atom-centered approaches? To isolate this representational factor, we introduce a static Bond-Centered Fingerprint (BCFP) that mirrors Chemprop's bond-centric view, and we compare it directly against ECFP using a lightweight Random Forest or XGBoost pipeline on the Blood-Brain Barrier Penetration (BBBP) classification task. To our knowledge, this is the first study to propose BCFP and analyze its complementarity with ECFP (7). Our results indicate that concatenating atom- and bond-centered fingerprints yields efficient and effective models for BBBP prediction, clarifying why bond-centric message passing often appears among top-k performers while offering a simple, fast alternative to full neural architectures.

artificial intelligence, bcfp, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2510.04837

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

TabINR: An Implicit Neural Representation Framework for Tabular Data Imputation

Ochs, Vincent, Bieder, Florentin, Hadramy, Sidaty el, Friedrich, Paul, Taha-Mehlitz, Stephanie, Taha, Anas, Cattin, Philippe C.

arXiv.org Artificial Intelligence, Oct-2-2025

Tabular data forms the basis for a wide range of applications, yet real-world datasets are frequently incomplete due to collection errors, privacy restrictions, or sensor failures. Missing values degrade the performance or hinder the applicability of downstream models, and simple imputation strategies tend to introduce bias or distort the underlying data distribution, so we require imputers that provide high-quality imputations, are robust across dataset sizes, and yield fast inference. We introduce TabINR, an auto-decoder-based Implicit Neural Representation (INR) framework that models tables as neural functions. Building on recent advances in generalizable INRs, we introduce learnable row and feature embeddings that effectively deal with the discrete structure of tabular data and can be inferred from partial observations, enabling instance-adaptive imputations without modifying the trained model. We evaluate our framework across a diverse range of twelve real-world datasets and multiple missingness mechanisms, demonstrating consistently strong imputation accuracy, mostly matching or outperforming classical (KNN, MICE, MissForest) and deep-learning-based models (GAIN, ReMasker), with the clearest gains on high-dimensional datasets. Tabular data remains one of the most common data formats across domains such as healthcare, finance, and the social sciences (Shwartz-Ziv & Armon, 2022). In these fields, missing values are ubiquitous and can severely degrade the performance of downstream machine learning models. Poor handling of missingness not only reduces predictive accuracy but may also lead to biased decisions, with real-world consequences for applications such as medical diagnostics or financial risk assessment. These challenges make robust imputation a critical step for trustworthy tabular learning and data-driven decision making (Rubin, 1976).
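The remark that simple imputation can distort the data distribution can be illustrated with a minimal sketch. It is not TabINR or any of the baselines named above; it shows only plain mean imputation on a hypothetical feature column, where `mean_impute` and the sample values are assumptions made for illustration.

```python
from statistics import mean, pvariance


def mean_impute(column):
    """Replace None entries with the mean of the observed entries."""
    observed = [x for x in column if x is not None]
    fill = mean(observed)
    return [fill if x is None else x for x in column]


# Hypothetical feature column with two missing entries (assumed data).
column = [1.0, 2.0, None, 4.0, None, 9.0]
observed = [x for x in column if x is not None]
imputed = mean_impute(column)

# Compare the spread of the observed values with the spread after imputation.
print("observed variance:", pvariance(observed))
print("imputed variance: ", pvariance(imputed))
```

Every filled-in value sits exactly at the observed mean, so the imputed column has a smaller variance than the observed values. This is one concrete way a simple strategy distorts the distribution.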

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2510.01136

Country:

Europe > Switzerland (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Towards a Unified Framework for Uncertainty-aware Nonlinear Variable Selection with Theoretical Guarantees (with Supplementary Material)

Neural Information Processing Systems, Aug-17-2025, 20:03:47 GMT

Supplementary material is at the end of this document.

artificial intelligence, data mining, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Asia > Bangladesh (0.04)
Asia > Russia > Siberian Federal District > Krasnoyarsk Krai > Krasnoyarsk (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(3 more...)

Genre: Research Report > New Finding (0.45)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
(5 more...)

Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs

German, Eyal, Antebi, Sagiv, Samira, Daniel, Shabtai, Asaf, Elovici, Yuval

arXiv.org Artificial Intelligence, Jul-24-2025

Large language models (LLMs) are increasingly trained on tabular data, which, unlike unstructured text, often contains personally identifiable information (PII) in a highly structured and explicit format. As a result, privacy risks arise, since sensitive records can be inadvertently retained by the model and exposed through data extraction or membership inference attacks (MIAs). While existing MIA methods primarily target textual content, their efficacy and threat implications may differ when applied to structured data, due to its limited content, diverse data types, unique value distributions, and column-level semantics. In this paper, we present Tab-MIA, a benchmark dataset for evaluating MIAs on tabular data in LLMs and demonstrate how it can be used. Tab-MIA comprises five data collections, each represented in six different encoding formats. Using our Tab-MIA benchmark, we conduct the first evaluation of state-of-the-art MIA methods on LLMs fine-tuned with tabular data across multiple encoding formats. In the evaluation, we analyze the memorization behavior of pretrained LLMs on structured data derived from Wikipedia tables. Our findings show that LLMs memorize tabular data in ways that vary across encoding formats, making them susceptible to extraction via MIAs. Even when fine-tuned for as few as three epochs, models exhibit high vulnerability, with AUROC scores approaching 90% in most cases. Tab-MIA enables systematic evaluation of these risks and provides a foundation for developing privacy-preserving methods for tabular data in LLMs.
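To make "the same table in different encoding formats" concrete, here is a minimal sketch that serializes one small table three ways. The abstract does not list Tab-MIA's six formats, so JSON, CSV, and a Markdown table are assumptions chosen for illustration, as are the helper names and the sample row.

```python
import csv
import io
import json


def to_json(header, rows):
    """Encode each row as a JSON object keyed by column name."""
    return json.dumps([dict(zip(header, row)) for row in rows])


def to_csv(header, rows):
    """Encode the table as comma-separated lines with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def to_markdown(header, rows):
    """Encode the table as a pipe-delimited Markdown table."""
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(str(cell) for cell in row) + " |" for row in rows]
    return "\n".join(lines)


# Hypothetical one-row table (assumed data).
header = ["name", "age"]
rows = [["Ada", 36]]

for encode in (to_json, to_csv, to_markdown):
    print(encode(header, rows))
```

The three outputs carry identical cell values but differ in token sequence, which is the kind of variation a per-format memorization analysis would compare.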

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2507.17259

Country:

North America > United States (0.29)
Asia (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Beyond Public Access in LLM Pre-Training Data

Rosenblat, Sruly, O'Reilly, Tim, Strauss, Ilan

arXiv.org Artificial Intelligence, May-2-2025

Our AUROC scores show that GPT-4o, OpenAI's more recent and capable model, demonstrates strong recognition of paywalled O'Reilly book content (AUROC = 82%), compared to OpenAI's earlier model GPT-3.5 Turbo. In contrast, GPT-3.5 Turbo shows greater relative recognition of publicly accessible O'Reilly book samples. GPT-4o Mini, as a much smaller model, shows no knowledge of public or non-public O'Reilly Media content when tested (AUROC 50%). Testing multiple models, with the same cutoff date, helps us account for potential language shifts over time that might bias our findings. These results highlight the urgent need for increased corporate transparency regarding pre-training data sources as a means to develop formal licensing frameworks for AI content training.
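For readers unfamiliar with the metric, the sketch below computes AUROC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. It is a generic illustration, not the authors' evaluation code; the function name and the scores are hypothetical.

```python
def auroc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical recognition scores: "seen" = text assumed to be in the training
# data, "unseen" = text assumed to be outside it.
seen = [0.9, 0.8, 0.7, 0.4]
unseen = [0.6, 0.5, 0.3, 0.2]
print(auroc(seen, unseen))  # 14 of 16 pairs ranked correctly -> 0.875

# A model whose scores cannot separate the two groups lands at 0.5.
print(auroc([0.5, 0.5], [0.5, 0.5]))  # -> 0.5
```

Under this reading, an AUROC of 50% means the scores are no better than chance at separating the two groups, while values closer to 100% mean the groups are ranked apart almost perfectly.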

auroc score, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.35650/AIDP.4111.d.2025

2505.0002

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.94)
Government (0.93)
Media (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.59)

Beyond Glucose-Only Assessment: Advancing Nocturnal Hypoglycemia Prediction in Children with Type 1 Diabetes

Voegeli, Marco, Laguna, Sonia, Leutheuser, Heike, Pfister, Marc, Burckhardt, Marie-Anne, Vogt, Julia E

arXiv.org Artificial Intelligence, Apr-15-2025

The dead-in-bed syndrome describes the sudden and unexplained death of young individuals with Type 1 Diabetes (T1D) without prior long-term complications. One leading hypothesis attributes this phenomenon to nocturnal hypoglycemia (NH), a dangerous drop in blood glucose during sleep. This study aims to improve NH prediction in children with T1D by leveraging physiological data and machine learning (ML) techniques. We analyze an in-house dataset collected from 16 children with T1D, integrating physiological metrics from wearable sensors. We explore predictive performance through feature engineering, model selection, architectures, and oversampling. To address data limitations, we apply transfer learning from a publicly available adult dataset. Our results achieve an AUROC of 0.75 ± 0.21 on the in-house dataset, further improving to 0.78 ± 0.05 with transfer learning. This research moves beyond glucose-only predictions by incorporating physiological parameters, showcasing the potential of ML to enhance NH detection and improve clinical decision-making for pediatric diabetes management.
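The abstract reports each AUROC as a central value with a spread, but does not say how the spread was obtained. The sketch below assumes it is a sample standard deviation over per-split AUROC values. That reading, the helper name `summarize`, and the example values are all assumptions.

```python
from statistics import mean, stdev


def summarize(values):
    """Format a list of per-split scores as 'mean ± sample standard deviation'."""
    return f"{mean(values):.2f} ± {stdev(values):.2f}"


# Hypothetical per-split AUROC values (not taken from the paper).
split_aurocs = [0.5, 0.6, 0.7, 0.8]
summary = summarize(split_aurocs)
print(summary)
```

With only a handful of splits or subjects, a single weak split widens the spread considerably. This may be why a small cohort can show a large deviation around a reasonable mean.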

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2504.09299

Country: Europe > Switzerland (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Evaluating Self-Supervised Learning in Medical Imaging: A Benchmark for Robustness, Generalizability, and Multi-Domain Impact

Bundele, Valay, Çal, Oğuz Ata, Kargi, Bora, Sarıtaş, Karahan, Tezören, Kıvanç, Ghaderi, Zohreh, Lensch, Hendrik

arXiv.org Artificial Intelligence, Dec-26-2024

Self-supervised learning (SSL) has emerged as a promising paradigm in medical imaging, addressing the chronic challenge of limited labeled data in healthcare settings. While SSL has shown impressive results, existing studies in the medical domain are often limited in scope, focusing on specific datasets or modalities, or evaluating only isolated aspects of model performance. This fragmented evaluation approach poses a significant challenge, as models deployed in critical medical settings must not only achieve high accuracy but also demonstrate robust performance and generalizability across diverse datasets and varying conditions. To address this gap, we present a comprehensive evaluation of SSL methods within the medical domain, with a particular focus on robustness and generalizability. Using the MedMNIST dataset collection as a standardized benchmark, we evaluate 8 major SSL methods across 11 different medical datasets. Our study provides an in-depth analysis of model performance in both in-domain scenarios and the detection of out-of-distribution (OOD) samples, while exploring the effect of various initialization strategies, model architectures, and multi-domain pre-training. We further assess the generalizability of SSL methods through cross-dataset evaluations and the in-domain performance with varying label proportions (1%, 10%, and 100%) to simulate real-world scenarios with limited supervision. We hope this comprehensive benchmark helps practitioners and researchers make more informed decisions when applying SSL methods to medical applications.

artificial intelligence, initialization, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2412.19124

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.85)
Health & Medicine > Health Care Technology (0.84)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.71)

Hyperedge Anomaly Detection with Hypergraph Neural Network

Alam, Md. Tanvir, Ahmed, Chowdhury Farhan, Leung, Carson K.

arXiv.org Artificial Intelligence, Dec-7-2024

A hypergraph is a data structure that enables us to model higher-order associations among data entities. Conventional graph-structured data can represent pairwise relationships only, whereas a hypergraph enables us to associate any number of entities, which is essential in many real-life applications. Hypergraph learning algorithms have been well studied for numerous problem settings, such as node classification and link prediction. However, much less research has been conducted on anomaly detection from hypergraphs. Anomaly detection identifies events that deviate from the usual pattern and can be applied to hypergraphs to detect unusual higher-order associations. In this work, we propose an end-to-end hypergraph neural network-based model for identifying anomalous associations in a hypergraph. Our proposed algorithm operates in an unsupervised manner without requiring any labeled data. Extensive experimentation on several real-life datasets demonstrates the effectiveness of our model in detecting anomalous hyperedges.
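The contrast between pairwise graphs and hypergraphs can be sketched with plain Python containers. This is not the authors' model. It only shows one common way to store hyperedges as node sets, and what is lost when they are flattened into pairwise edges. The node names, edge ids, and helper functions are hypothetical.

```python
from collections import defaultdict

# Each hyperedge links any number of nodes (assumed example data).
hyperedges = {
    "e1": {"alice", "bob", "carol"},
    "e2": {"bob", "dave"},
    "e3": {"alice", "carol", "dave", "erin"},
}


def node_to_edges(hyperedges):
    """Build the incidence map: for each node, the ids of hyperedges containing it."""
    incidence = defaultdict(set)
    for edge_id, members in hyperedges.items():
        for node in members:
            incidence[node].add(edge_id)
    return dict(incidence)


def clique_expansion(hyperedges):
    """Flatten every hyperedge into all of its node pairs (an ordinary graph)."""
    pairs = set()
    for members in hyperedges.values():
        ordered = sorted(members)
        for i, u in enumerate(ordered):
            for v in ordered[i + 1:]:
                pairs.add((u, v))
    return pairs


incidence = node_to_edges(hyperedges)
pairs = clique_expansion(hyperedges)
print(sorted(incidence["bob"]))
print(len(pairs))
```

After the expansion, the pair `("alice", "carol")` appears once even though it came from two different group associations. The pairwise view can no longer tell which higher-order groups existed, and that is the information a hyperedge-level anomaly detector works with.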

data mining, hyperedge, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2412.05641

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
North America > Canada > Manitoba (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
